Consumer electronics based smart technologies for enhanced terahertz healthcare having an integration of split learning with medical imaging

The proposed work makes three major contributions: smart data collection, an optimized training algorithm, and the integration of a Bayesian approach with split learning to preserve the privacy of patient data. By integrating consumer electronics devices, such as wearables and Internet of Things (IoT) devices, with THz imaging, using the EM algorithm for training, and applying the newly proposed split learning method, the technology promises enhanced imaging depth and improved tissue contrast, thereby enabling early and accurate detection of breast cancer. With our hybrid algorithm, the breast cancer model achieves an accuracy of 97.5 percent over 100 epochs, surpassing older, less accurate models that required more epochs, such as 165.


Research gap and problem definition
The amalgamation of artificial intelligence (AI) and intelligent terahertz technology presents an intriguing frontier in consumer electronics, particularly in healthcare applications [33][34][35][36]. By combining AI's cognitive capabilities with terahertz's proficiency in non-invasive imaging, a groundbreaking healthcare system emerges as a plausible prospect. This fusion equips consumer electronic devices with the capacity to conduct advanced medical scans, allowing users to monitor their well-being in unprecedented ways. From the identification of early-stage irregularities to the monitoring of chronic conditions, this advancement offers personalized insights and timely interventions, all conveniently accessible from one's home [37][38][39][40]. The inclusion of AI improves the diagnostic precision and interpretability of the system, making it user-friendly and dependable. The potential for wearables, smartphones, and other consumer electronics to function as health monitoring tools signifies a paradigm shift toward proactive healthcare. By enhancing the accessibility and viability of healthcare data, individuals are empowered to take charge of their health in an unparalleled manner. Essentially, the AI-driven intelligent terahertz healthcare system tailored for consumer electronics envisions a future where empowerment and well-being converge.
The research gap is evident in two significant aspects, namely, the effective use of split learning for breast cancer prediction and the incorporation of data obtained from consumer wearable devices. The first challenge is the collection of data and the integration of IoT and wearable smart consumer electronics (CE) devices for health monitoring and sensing. These devices can promptly accumulate patient/consumer information, including vital signs such as heart rate and temperature, in real time and subsequently transmit these data to the healthcare system for analysis. The second step is the development of a novel model that integrates split learning with terahertz medical imaging for the early detection of breast cancer. This model is characterized by high throughput and secure technology [41][42][43][44][45][46].

Proposed method
The architecture of the proposed model consists of three distinct components. The first component is data collection, carried out with smart consumer electronics through an IoT-based system or a wearable sensor. The second component is model training, and the third is the evaluation and validation of the model's outcomes. For training, the EM-optimized algorithm is used, while the Bayesian-based split learning model is used to predict breast cancer.
Data preprocessing and the materials and methods for terahertz (THz) imaging-based breast cancer identification involve several key steps: data collection and acquisition, data preprocessing, feature extraction, and dataset splitting.

Data acquisition
During data acquisition, THz imaging data are gathered from freshly excised mouse tumors, ensuring that the data are accurately recorded and properly labeled for breast cancer identification. A preprocessing module then removes noise and artifacts from the THz data, so that the dataset is free of anomalies that might degrade model performance. Next, suitable features must be identified, which is the role of the feature extraction method: the relevant features or characteristics of the THz images are determined and subsequently used as input variables for the models. This may involve techniques such as edge detection or texture analysis. Finally, the dataset is divided into training and testing sets to assess model performance. Because the approach aims to reduce the demand for extensive training data, this split should be designed to make efficient use of the available data (see Figs. 1 and 2).
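The text only states that the split should make efficient use of limited data. One common way to do this, assumed here rather than taken from the paper, is a stratified split that keeps the class proportions (for example cancer, muscle and fat pixels) the same in the training and test sets. The function name `stratified_split`, the 20% test fraction and the random seed are illustrative choices.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion.

    Hypothetical helper: stratification is an assumed design choice, not the
    authors' documented procedure.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # indices of this class
        rng.shuffle(idx)
        n_test = int(round(test_fraction * idx.size))
        if idx.size > 1:
            # keep at least one sample of the class on each side of the split
            n_test = min(max(n_test, 1), idx.size - 1)
        else:
            n_test = 0
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return (np.sort(np.array(train_idx, dtype=int)),
            np.sort(np.array(test_idx, dtype=int)))

# Usage with hypothetical labels: 0 = cancer, 1 = muscle, 2 = fat
labels = [0] * 8 + [1] * 12 + [2] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=1)
```

With few samples per class, forcing at least one test sample per class keeps every tissue type represented in evaluation.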
Data acquisition can be described mathematically through sampling: a continuous signal is sampled at discrete time intervals, which is often expressed as
$$x[n] = x(t_n) = F(nT_S), \qquad t_n = nT_S,$$
where $x[n]$ is the sampled data at the discrete time index $n$, $x(t_n)$ is the continuous signal $x(t)$ sampled at the specific time $t_n$, $F(nT_S)$ is the function that samples the continuous signal $x(t)$ at the discrete time interval, $n$ is the index of the discrete samples, and $T_S$ is the sampling interval. This equation describes how continuous data are converted into discrete form through sampling. The choice of the sampling interval $T_S$ is critical and is governed by the Nyquist-Shannon sampling theorem: to avoid aliasing, $T_S$ should be less than or equal to half the reciprocal of the highest frequency component $f_{\max}$ present in the signal, i.e. $T_S \le 1/(2 f_{\max})$.
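The sampling relation $x[n] = x(nT_S)$ and the Nyquist condition can be sketched as follows. This is a minimal illustration, assuming the continuous signal is available as a Python function of time; the 5 Hz tone is a hypothetical stand-in, not THz data, and the name `sample_signal` is illustrative.

```python
import numpy as np

def sample_signal(x, T_s, n_samples, f_max=None):
    """Sample a continuous-time signal x(t) at t_n = n * T_s, i.e. x[n] = x(n * T_s).

    If f_max (highest frequency in the signal, Hz) is given, reject a sampling
    interval that violates the Nyquist condition T_s <= 1 / (2 * f_max).
    """
    if f_max is not None and T_s > 1.0 / (2.0 * f_max):
        raise ValueError(
            f"T_s = {T_s} s aliases: need T_s <= {1.0 / (2.0 * f_max)} s for f_max = {f_max} Hz"
        )
    n = np.arange(n_samples)   # discrete sample indices n = 0, 1, ..., N-1
    t_n = n * T_s              # sampling instants t_n = n * T_s
    return t_n, x(t_n)         # x must accept a NumPy array of times

# Usage with a hypothetical 5 Hz tone: T_s = 0.05 s means 20 Hz >= 2 * 5 Hz
tone = lambda t: np.sin(2 * np.pi * 5.0 * t)
t_n, x_n = sample_signal(tone, T_s=0.05, n_samples=40, f_max=5.0)
```

Choosing `T_s = 0.2` for the same tone (5 Hz sampling) would raise, because it falls below twice the signal frequency.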

Preprocessing of the raw data
This section discusses the preprocessing method, which takes the sampled data through normalization, scaling, and related steps. Once the data have been acquired, preprocessing encompasses steps such as normalization, imputation, and scaling. The whole process, from the raw data to the choice of an appropriate model, is shown in Fig. 2. A common preprocessing step, min-max normalization, can be written as
$$x_{\text{normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)},$$
where $x$ is the original data point, $x_{\text{normalized}}$ is the normalized data point, and $\min(x)$ and $\max(x)$ are the minimum and maximum values in the dataset. After normalization, a filter is applied to remove unwanted components from the signal; a low-pass or high-pass filter is chosen as required. A low-pass filter can be represented as the convolution
$$y(t) = (h * x)(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau,$$
where $y(t)$ is the filtered signal, $x(t)$ is the original signal, and $h(t)$ is the impulse response of the filter.
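Both preprocessing formulas can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional signal and a moving-average (box) impulse response as the low-pass filter $h$; the paper does not specify its filter, and the function names and window width are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """x_normalized = (x - min(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        return np.zeros_like(x)   # constant signal: avoid division by zero
    return (x - x_min) / (x_max - x_min)

def moving_average_lowpass(x, width=5):
    """Discrete form of y = h * x with an assumed box impulse response h.

    mode="same" keeps the output aligned with and as long as the input;
    samples within width // 2 of either edge see a partially empty window.
    """
    h = np.ones(width) / width    # box kernel whose taps sum to 1
    return np.convolve(np.asarray(x, dtype=float), h, mode="same")

# Usage: normalize, then smooth out a hypothetical high-frequency ripple
x = np.array([(-1.0) ** n for n in range(40)]) + 3.0
y = moving_average_lowpass(min_max_normalize(x))
```

A box filter is the simplest low-pass choice; a kernel designed for a specific cutoff frequency would be used in practice.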

Feature extraction
Once the scaled data are obtained, suitable features must be identified. Because the data contain numerous parameters, principal component analysis (PCA) is used to identify the relevant information that can aid in predicting the healthcare status. The step-by-step process, shown in Fig. 1, runs from the choice of raw data through filtering and feature extraction (the PCA method) to the selection of appropriate training, testing, and validation models. After the filtering stage, feature extraction converts the data into a collection of pertinent features. PCA is a widely used method for reducing the dimensionality of data and extracting features; it identifies the principal components through eigendecomposition of the covariance matrix of the data.
$$Z = XW$$
where $X$ is the original (mean-centered) data in matrix form, $W$ is the matrix of principal components (the eigenvectors of the covariance matrix of $X$ with the largest eigenvalues), and $Z$ is the transformed data of reduced dimensionality, also in matrix form. PCA seeks $W$ such that $Z$ retains as much of the variance of $X$ as possible while having fewer dimensions. In the next step, this extracted feature matrix is used as the input data $X$.
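The PCA step $Z = XW$ can be sketched with NumPy's symmetric eigendecomposition. This is a minimal illustration, assuming the features are columns of $X$ and that $k$ components are kept; the function name `pca` and the returned explained-variance ratio are illustrative additions.

```python
import numpy as np

def pca(X, k):
    """Project X (n x m) onto its top-k principal components: Z = X_c @ W."""
    X = np.asarray(X, dtype=float)
    X_c = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(X_c, rowvar=False)           # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort components by variance, largest first
    W = eigvecs[:, order[:k]]                 # m x k matrix of eigenvectors
    Z = X_c @ W                               # n x k reduced data
    explained = eigvals[order[:k]].sum() / eigvals.sum()
    return Z, W, explained

# Usage with hypothetical data lying close to a single direction in 3-D
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.outer(t, [1.0, 2.0, 2.0]) / 3.0 + 1e-3 * rng.normal(size=(200, 3))
Z, W, explained = pca(X, k=1)
```

`explained` reports the fraction of total variance kept, which is one way to choose $k$.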

Implementation of the algorithm with split learning
In this section, we combine the split learning technique with a machine learning model to facilitate the early identification of breast cancer using terahertz medical imaging. Within the machine learning model, a Bayesian approach is employed to forecast the occurrence of breast cancer, and the split learning methodology ensures the privacy of sensitive and confidential patient information. The newly proposed mathematical model and the details of the split learning-based hybrid algorithm are explained below.
A newly proposed model integrating split learning for the early detection of breast cancer using terahertz medical imaging
Training the machine learning model is essential for predicting breast cancer accurately, and the model's accuracy depends heavily on the quality of the training data. In this study, formalin-fixed paraffin-embedded (FFPE) samples [22] were used as a reference for training. However, the dehydration process can degrade the quality of the training data because of shape differences between freshly excised samples and their FFPE counterparts. Once the machine learning model is trained, the appropriate algorithm can be used to predict breast cancer. The terahertz (THz) images of the given mouse data serve as the model input, and these data are fed to the proposed supervised multinomial Bayesian model with split learning for the identification of breast cancer. A further advantage of this model is that it requires only a small set of model parameters compared with other proposed models [22]. The mathematical details of the proposed algorithm are given in the following module.
Algorithm: Hybrid Training Procedure using split learning.

Initialize a population of solutions P.
Randomly generate an initial population of solutions in P.
if n ≤ population size P.size() then
    Evaluate the fitness of solution P(n) using Z(j).
end if
end if
end for
Output: Regression parameters [β(i), α(i)] for i = 1 to M.

Development of split learning model
Formulating the split learning algorithm for healthcare disease prediction mathematically requires expressing its fundamental elements and procedures in mathematical symbols. The split learning procedure can be represented as follows. The feature extraction stage yields the input data $X \in \mathbb{R}^{n \times m}$, where $n$ is the number of samples (patients/mice) and $m$ is the number of features (attributes/parameters). Let $Y$ denote the target output (the disease label) for the input data $X$, and let $\hat{Y}$ denote the model's prediction. The input model is a function $F_c$ with parameters $\theta_c$ that extracts features $H$ from the input data:
$$H = F_c(X; \theta_c).$$
Similarly, the output model $F_s$ with parameters $\theta_s$ makes the final disease predictions based on the features extracted by the input model:
$$\hat{Y} = F_s(H; \theta_s).$$
The split learning process involves iterative communication rounds between the input and output models until convergence. In iteration $t$, the input model computes its features
$$H^{(t)} = F_c\big(X; \theta_c^{(t)}\big),$$
and the output model takes these features and makes predictions:
$$\hat{Y}^{(t)} = F_s\big(H^{(t)}; \theta_s^{(t)}\big).$$
The input model updates its parameters using the predictions and a loss function $\mathcal{L}\big(Y, \hat{Y}^{(t)}\big)$, with the gradient obtained by back-propagating $\partial \mathcal{L} / \partial H^{(t)}$ from the output model:
$$\theta_c^{(t+1)} = \theta_c^{(t)} - \alpha \, \nabla_{\theta_c} \mathcal{L}\big(Y, \hat{Y}^{(t)}\big).$$
Similarly, the output model updates its parameters using the same loss function:
$$\theta_s^{(t+1)} = \theta_s^{(t)} - \alpha \, \nabla_{\theta_s} \mathcal{L}\big(Y, \hat{Y}^{(t)}\big),$$
where $\alpha$ is the learning rate. This process continues until convergence. After convergence, the final parameters of the global model are obtained by combining the converged parameters $\theta_c^{*}$ and $\theta_s^{*}$ of the input and output models, so that the global model is $F(X) = F_s\big(F_c(X; \theta_c^{*}); \theta_s^{*}\big)$. Appropriate evaluation metrics, such as accuracy, precision, recall, and F1-score, can then be used to evaluate the efficiency of the model. This mathematical model offers a comprehensive framework for understanding the split learning algorithm used in healthcare disease prediction.
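The iterative rounds of split learning can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' Bayesian model: the input model $F_c$ is assumed to be one tanh layer, the output model $F_s$ a logistic-regression layer, and $\mathcal{L}$ the binary cross-entropy. The client sends only the activations $H$ to the server, and the server returns only $\partial\mathcal{L}/\partial H$, so the raw data $X$ never leave the client. The classes `Client`, `Server` and the function `train_split` are hypothetical names.

```python
import numpy as np

class Client:
    """Input model F_c(X; theta_c), assumed here to be one tanh layer."""
    def __init__(self, m, h, rng):
        self.W = rng.normal(scale=0.5, size=(m, h))
        self.b = np.zeros(h)

    def forward(self, X):
        self.X = X
        self.H = np.tanh(X @ self.W + self.b)
        return self.H                          # only activations leave the client

    def backward(self, dH, lr):
        dpre = dH * (1.0 - self.H ** 2)        # chain rule through tanh
        self.W -= lr * (self.X.T @ dpre)       # theta_c <- theta_c - alpha * grad
        self.b -= lr * dpre.sum(axis=0)

class Server:
    """Output model F_s(H; theta_s), assumed here to be logistic regression."""
    def __init__(self, h, rng):
        self.w = rng.normal(scale=0.5, size=h)
        self.b = 0.0

    def step(self, H, y, lr):
        p = 1.0 / (1.0 + np.exp(-(H @ self.w + self.b)))
        eps = 1e-12
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        dz = (p - y) / y.size                  # dL/dlogit for the mean BCE loss
        dH = np.outer(dz, self.w)              # dL/dH, computed before w changes
        self.w -= lr * (H.T @ dz)              # theta_s <- theta_s - alpha * grad
        self.b -= lr * dz.sum()
        return loss, dH

    def predict(self, H):
        return (H @ self.w + self.b > 0).astype(int)

def train_split(X, y, hidden=8, lr=0.5, epochs=300, seed=0):
    rng = np.random.default_rng(seed)
    client, server = Client(X.shape[1], hidden, rng), Server(hidden, rng)
    losses = []
    for _ in range(epochs):
        H = client.forward(X)                  # client -> server: activations H
        loss, dH = server.step(H, y, lr)       # server -> client: gradient dL/dH
        client.backward(dH, lr)
        losses.append(loss)
    return client, server, losses

# Usage on hypothetical, well-separated two-class data
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.r_[np.zeros(100), np.ones(100)]
client, server, losses = train_split(X, y)
```

The global model after training is the composition `server.predict(client.forward(X))`, mirroring $F_s(F_c(X;\theta_c^*);\theta_s^*)$.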

Testing process in the proposed model
To determine whether there is a significant difference between the mean values of cancerous and non-cancerous pixels within each test sample under the proposed model, we first conducted a univariate t-test. This analysis used the first component of the low-dimensional vector of each pixel, obtained from the output of the proposed algorithm.
The null hypothesis of this t-test is that the outputs for cancerous and non-cancerous pixels have equal means. Table 1 presents the results of the t-test: the p-values for all test samples were close to zero. The null hypothesis can therefore be rejected, confirming that the mean outputs for cancerous and non-cancerous pixels differ significantly.
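The per-sample test can be sketched with SciPy's two-sample t-test. This is a minimal illustration, assuming the first low-dimensional component of each pixel has already been split into cancerous and non-cancerous groups; the synthetic values below are hypothetical, not the paper's data, and `compare_pixel_means` is an illustrative name. The 0.05 significance level follows Table 1.

```python
import numpy as np
from scipy import stats

def compare_pixel_means(cancer_vals, noncancer_vals, alpha=0.05):
    """Two-sample t-test with H0: equal mean output for both pixel groups."""
    res = stats.ttest_ind(cancer_vals, noncancer_vals)   # Student's t (equal variances)
    return float(res.statistic), float(res.pvalue), bool(res.pvalue < alpha)

# Usage with hypothetical first-component values for one test sample
rng = np.random.default_rng(0)
t_stat, p_value, reject_h0 = compare_pixel_means(
    rng.normal(1.0, 0.5, 500), rng.normal(0.0, 0.5, 500)
)
```

If the two groups have clearly different variances, `stats.ttest_ind(..., equal_var=False)` gives Welch's test instead.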

Result
To confirm the substantial reduction in computational complexity achieved by the proposed algorithm during training, we compared the results of the proposed classifiers with those of other classifiers.
Tables 1 and 2 present, respectively, the results for the chosen samples and the areas under the ROC curve for various models compared with our proposed model. For the THz images, only tissue that underwent the histopathology process was considered. We use an EM algorithm for data selection in training: only data that exceed a certain reliability threshold are used for training. This allows only a few parameters to be used in the training process.
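The paper does not specify its reliability criterion, so the following is only a sketch of one way EM-based data selection could work: fit a two-component Gaussian mixture to a one-dimensional feature by EM and keep samples whose most likely component has a posterior probability of at least 0.95. The mixture model, the threshold and the function names are all assumptions.

```python
import numpy as np

def _e_step(x, pi, mu, var):
    # Responsibilities: posterior probability of each component for each sample.
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

def em_gmm_1d(x, n_iter=100):
    """Fit an assumed two-component 1-D Gaussian mixture by EM."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75])     # simple data-driven initialisation
    var = np.full(2, x.var())
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = _e_step(x, pi, mu, var)                                   # E-step
        nk = resp.sum(axis=0)                                            # M-step
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return _e_step(x, pi, mu, var), pi, mu, var

def select_reliable(x, threshold=0.95):
    """Indices of samples whose most likely component has posterior >= threshold."""
    resp, *_ = em_gmm_1d(x)
    return np.flatnonzero(resp.max(axis=1) >= threshold)

# Usage with a hypothetical bimodal feature
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1, 300)])
kept = select_reliable(x)
```

Samples near the boundary between the two components receive posteriors close to 0.5 and are excluded from training.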
The experimental findings demonstrate that the recommended supervised regression models outperform current algorithms, including 1D MCMC and 2D EM, in all regions of interest. For instance, with the supervised polynomial regression approach, the areas under the ROC curves for cancer and muscle in the Mouse 9B fresh sample increase from 90.68% to 92.71% and from 71.35% to 86.18%, respectively.
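The area under the ROC curve used for these comparisons equals the probability that a randomly chosen positive pixel scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration with hypothetical scores, not the authors' evaluation code, and it compares every pair, so memory grows as (positives × negatives).

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative), ties = 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # pairs ranked correctly
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs count half
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Usage with hypothetical classifier scores (1 = cancer, 0 = other tissue)
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs ordered correctly
```

For the example, the pairs (0.35, 0.1), (0.8, 0.1) and (0.8, 0.4) are ordered correctly and (0.35, 0.4) is not, giving 0.75.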

Discussion
5][46]. These results represent a step forward in achieving optimal differentiation between cancerous and non-cancerous tissue within freshly excised BCS. However, our proposed split learning model achieves the highest areas under the ROC curves among all the presented classifiers for all three tissue categories. This illustrates the high efficiency and optimization of our model in terms of time.

Analysis of given model
The segmentation outcomes produced by the Bayesian model with split learning yield higher ROC values than those of the other models, such as 3D polynomial regression, the supervised kernel method, the proposed kernel regression classifier, 1D MCMC and 2D EM, as shown in Figs. 3, 4 and 5 (simulated in MATLAB R2022a). Specifically, while the split learning approach shows potential in detecting cancer and fat regions, it falls short of completely identifying the muscle region, as depicted in Figs. 4 and 5. Figure 3 shows the THz image of a fresh sample with cancer. Figures 4 and 5 contain information for three regions, cancer, muscle and fat, and compare the models fitted by different algorithms with our proposed split learning model. It is important to note that the results of these models were obtained using the most favorable segmentation thresholds derived from each receiver operating characteristic (ROC) curve. This approach prioritized the identification of cancer in all areas, with muscle or fat as the subsequent priority. The segmentation outcomes of the split learning model proposed in this research are then compared with those of the SVM and the kernel regression classifier in Figs. 4 and 5.
Split learning, a form of distributed machine learning, improves both accuracy and security by distributing model training among different entities without sharing raw data. In contrast to traditional techniques such as SVM and kernel methods, which are typically trained on centralized data, split learning enables model training on decentralized data, preserving privacy and minimizing the possibility of data breaches. This approach is particularly effective when data cannot be centralized because of privacy regulations or logistical constraints, leading to improved model performance and robustness while maintaining data confidentiality.
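The text does not say which criterion defines the "most favorable" segmentation threshold on each ROC curve. A common choice, assumed here, is the threshold that maximizes Youden's J statistic, $J = \text{TPR} - \text{FPR}$. The sketch below scans every distinct score as a candidate threshold; `best_threshold` and the example scores are hypothetical.

```python
import numpy as np

def best_threshold(scores, labels):
    """Threshold maximising Youden's J = TPR - FPR (an assumed criterion)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):                 # each distinct score is a candidate
        pred = scores >= t                      # pixels labelled as the target tissue
        tpr = (pred & labels).sum() / labels.sum()
        fpr = (pred & ~labels).sum() / (~labels).sum()
        if tpr - fpr > best_j:                  # keep the first (lowest) maximiser
            best_t, best_j = t, tpr - fpr
    return best_t, best_j

# Usage with hypothetical pixel scores (1 = cancer, 0 = other tissue)
t_opt, j_opt = best_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

A cost-weighted criterion could replace J when missing cancer is judged worse than a false alarm, which matches the stated priority on cancer detection.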

Statistical validation of the proposed model
Statistical validation of the breast identification and proposed models is detailed in Table 3, with the corresponding learning curves shown in Figs. 6 and 7. Training of the breast model was concluded after 100 epochs, when the validation accuracy reached a plateau of 97.5%. For the other model (excluding the proposed model), training was terminated after 145 epochs, with the highest validation accuracy occurring at the 69th epoch; the model state at that epoch was therefore used for analysis. After training, both the breast and proposed models were validated on real-world datasets. Table 3 presents the accuracy, precision, recall, and F1-score for each class in the breast and proposed models.
We computed the numbers of true positives, true negatives, false positives and false negatives (TP, TN, FP and FN, respectively). In this work, TP is denoted by $\mu$, TN by $\mu_1$, FP by $\Gamma$ and FN by $\Gamma_1$. From these four quantities, the accuracy, precision, recall and F1-score, which characterize the performance of the split learning model for breast cancer prediction, are calculated as
$$\text{Accuracy} = \frac{\mu + \mu_1}{\mu + \mu_1 + \Gamma + \Gamma_1}, \quad \text{Precision} = \frac{\mu}{\mu + \Gamma}, \quad \text{Recall} = \frac{\mu}{\mu + \Gamma_1}, \quad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$
with the resulting values reported in Table 3.
The breast model exhibited an overall accuracy of 97.5%. Notably, only one image containing a breast was misclassified into the "no-breast" category, while two images without breasts were misclassified. Interestingly, the breast model displayed exceptional performance in identifying images
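The four metrics follow directly from the counts in the paper's notation ($\mu$ = TP, $\mu_1$ = TN, $\Gamma$ = FP, $\Gamma_1$ = FN). The function below is a small illustrative sketch; the counts in the usage line are hypothetical and are not the values behind Table 3.

```python
def classification_metrics(mu, mu1, gamma, gamma1):
    """Metrics from TP = mu, TN = mu1, FP = gamma, FN = gamma1."""
    accuracy = (mu + mu1) / (mu + mu1 + gamma + gamma1)
    precision = mu / (mu + gamma)          # fraction of predicted positives that are correct
    recall = mu / (mu + gamma1)            # fraction of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Usage with hypothetical counts: 50 TP, 45 TN, 3 FP, 2 FN
m = classification_metrics(50, 45, 3, 2)
```

Equivalently, $\text{F1} = 2\mu / (2\mu + \Gamma + \Gamma_1)$, which gives 100/105 for these counts.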

Conclusion
In this work, we introduced a hybrid Bayesian approach with split learning that helps detect cancer in THz images of the available data samples. The algorithm produces multinomial Bayesian ordinal probit regression models to perform classification within the THz images. In determining the connection between terahertz features and their corresponding classification outcomes, the method offers two notable advances: first, the model requires very few training parameters, and second, the addition of split learning makes the prototype highly secure. The use of split learning also improves the effectiveness and safety of collaborative use of medical data for the early detection of breast cancer with terahertz medical imaging, which has the potential to accelerate advancements in this critical area of healthcare. Integrating split learning with medical imaging improves the efficiency and accuracy of healthcare diagnosis and treatment. It enables collaborative training on decentralized data while maintaining data privacy, allowing neural networks to be trained on breast cancer images without sharing patient data across hospitals. The proposed hybrid method uses a data quality-based adaptive averaging strategy to handle variations in the quality of annotated ground truth, ensuring accurate segmentation of the given images. In addition, the EM training algorithm and SVM with the split learning method overcome performance drops caused by data heterogeneity, achieving results comparable to other algorithms such as SVM and kernel methods. The breast model reached a validation accuracy plateau of 97.5% after 100 epochs, and both the breast and proposed models were validated on real-world datasets. Moreover, these approaches demonstrate the suitability of split learning for collaborative learning in medical imaging and pave the way for future real-world implementations.

Future work
Beyond these two advantages, smart consumer electronics devices such as IoT devices, wearable health monitors and sensors can collect patient information and other parameters in real time and transmit them to the healthcare system for analysis, in tandem with terahertz imaging. In future work, we may also integrate cloud computing resources and servers, which are crucial for storing and processing large volumes of medical data and for performing distributed split learning tasks.
https://doi.org/10.1038/s41598-024-58741-0 www.nature.com/scientificreports/

Figure 1. Architecture of the newly proposed healthcare prediction system using the split learning method.

Figure 2. Step-by-step process, from data collection to the choice of an appropriate model with split learning, for predicting the healthcare condition.

Figure 3. THz image of a sample from the mouse data, showing cancer, muscle and fat.

Figure 4. Comparison of the proposed split learning model with other models on different samples.

Figure 5. Comparison of the split learning model with several other models: (a) cancer, (b) muscle, (c) fat.

Figure 6. Training and validation accuracy of the proposed split learning-based model on the breast cancer dataset.

Figure 7. Training and validation loss of the proposed split learning-based model on the breast cancer dataset.

Table 1. t-test results for three different datasets at a significance level of 0.05.

Table 2. Comparison of our proposed model with various other models for the Mouse 9B fresh data, in terms of area under the ROC curve.

Table 3. Accuracy, precision, recall, and F1-score of breast detection for the proposed model, evaluated on real-world data.